Skip to content

Add configuration file#48

Merged
jhamman merged 9 commits into
dask:masterfrom
mrocklin:config
May 17, 2018
Merged

Add configuration file#48
jhamman merged 9 commits into
dask:masterfrom
mrocklin:config

Conversation

@mrocklin

Copy link
Copy Markdown
Member

In dask/dask#3432 we're considering centralizing configuration in Dask projects. This PR is a proof of concept for what that would mean within a project like dask-jobqueue. This could use feedback both from dask/dask-jobqueue maintainers on what would be most comfortable for this user community, and also from dask/dask maintainers on how this constrains central development.

This adds a new jobqueue.yaml config file that allows users to specify
configuration in a file and have that be used across all of their use of the
dask-jobqueue project. This is also something that might allow IT to help
standardize configuration options for a given cluster.

@mrocklin mrocklin mentioned this pull request Apr 30, 2018
3 tasks
@jhamman

jhamman commented Apr 30, 2018

Copy link
Copy Markdown
Member

I'm generally a fan of this approach. Without reading all of the dask PR, do I understand correctly that these config options could go in the .dask/config.yml? I really like the idea of having a centralized place to control these options. Save a discussion on the default values, I don't have any objection to this concept.

@mrocklin

Copy link
Copy Markdown
Member Author

They could. They could also go into a special jobqueue.yaml. Also, we're proposing moving the config directory from ~/.dask/config.yaml to ~/.config/dask/*.yaml. On first import we'll comment out and copy the jobqueue.yaml file present in this repository to that directory. A system administrator could also provide a subset of these arguments in /etc/dask/ if desired. This would help IT or a group leader to specify these settings for many people without them having to do it themselves.

@guillaumeeb

Copy link
Copy Markdown
Member

This looks good! Adding a way to set standard configuration without breaking current use.
It is maybe too open on the file names, but I guess it is simpler this way, in charge of admin or users not to mess with same parameter in different files.

How do you deploy the default jobqueue.yml file if it is not present?
How do you deal with incomplete yml file? config parameters are overloaded with the default one?

@mrocklin

mrocklin commented Apr 30, 2018 via email

Copy link
Copy Markdown
Member Author

@guillaumeeb

Copy link
Copy Markdown
Member

Hmm, I probably missed something, but I still don't see where the dask_jobqueue/jobqueue.yaml is used or copied by the dask.config submodule.
Although I expect the default config file to never take precedence over other ones, I do not see right know how it works. Sorry, I've only taken a quick glance over the PR.

@mrocklin

mrocklin commented May 4, 2018

Copy link
Copy Markdown
Member Author

Hmm, I probably missed something, but I still don't see where the dask_jobqueue/jobqueue.yaml is used or copied by the dask.config submodule.

Right, sorry, I missed the config.py file. This should be more clear now. There is also now documentation for downstream libraries in https://github.com/mrocklin/dask/blob/c40d27a6cb606bbee6200a2bf6eba1869816398d/docs/source/configuration.rst

So, the dask config PR will probably go in sometime tomorrow. If we merge this PR as-is it will only work with the dask master version. I think that we have a few options:

  1. Don't merge this until the next dask release (probably a few weeks away)
  2. Add logic to this library so that it works regardless of the dask version
  3. Release dask-jobqueue now, and then merge this as-is and say that the master version of dask-jobqueue depends on the master version of dask

I'm somewhat inclined towards option 3, and averse to option 2, happy to go either way though.

@mrocklin

mrocklin commented May 6, 2018

Copy link
Copy Markdown
Member Author

OK, it looks like dask-jobqueue was released quite recently. Almost no new features have been added since @jhamman released two weeks ago. I'm going to go ahead and merge this once upstream libraries are ready and we pass tests here if there are no objections.

@mrocklin mrocklin force-pushed the config branch 2 times, most recently from de0ffed to 8a09d49 Compare May 6, 2018 13:15

@guillaumeeb guillaumeeb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with option 3 or 1. Do you plan to add something about dask master branch dependency in the documentation?
Would this be a general rule for dask related modules: master depend on master?

@mrocklin

mrocklin commented May 6, 2018

Copy link
Copy Markdown
Member Author

Would this be a general rule for dask related modules: master depend on master?

We try to provide some leeway when possible, but it's common to have master-dependence when we're about to make 0.X.0 version changes.

@mrocklin

mrocklin commented May 6, 2018

Copy link
Copy Markdown
Member Author

SGE tests are failing in a way that I don't currently understand. Trying to reproduce it locally but running into #53

@mrocklin mrocklin changed the title [RFC] Add configuration file Add configuration file May 7, 2018
@mrocklin

mrocklin commented May 7, 2018

Copy link
Copy Markdown
Member Author

Issues resolved. This now works for me. Critical feedback on this is welcome.

@mrocklin

mrocklin commented May 7, 2018

Copy link
Copy Markdown
Member Author

Live testing is also quite welcome. I haven't actually used this on a live cluster. It would be good to get usability feedback from anyone who uses dask-jobqueue semi-regularly.

@lesteve

lesteve commented May 7, 2018

Copy link
Copy Markdown
Member

Just curious (I don't know very much about cluster admin), what would the recommended approach for cluster admins? Would it be to drop the same yaml file in /etc/dask on all the nodes of the cluster?

@mrocklin

mrocklin commented May 8, 2018

Copy link
Copy Markdown
Member Author

Yes, I think that you would want to drop the subset of the configuration that made sense for all users into /etc/dask

# /etc/dask
jobqueue:
  name: dask-worker
  threads: 8
  processes: 4
  memory: 192GB  # should we change this to be for the full job?
  interface: ib0
  local-directory: /local/scratch/$USER  # should we support environment variables?
  queue: standard
  sge:
    resource-spec: ...

Then they can leave the other parts for users, or users can overwrite elements if they wish (user directories are given more precedence). The dask.config module will handle merging things.

# ~/.config/dask/jobqueue.yaml
jobqueue:
  queue: priority
  project: PB382938

@jhamman jhamman left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me. I have not tried this out yet and wont be able to for a few days so if others are pleased with how it works, I'm fine to see it merged.

@mrocklin

mrocklin commented May 9, 2018 via email

Copy link
Copy Markdown
Member Author

@jhamman

jhamman commented May 10, 2018

Copy link
Copy Markdown
Member

I gave this a try today. Things were not as smooth as I had hoped (probably user error). Here's a screen grab:

image

The details:

  • most recent release of xarray/jupyter
  • master branches of dask/distributed
  • this PR's branch of jobqueue
  • edited only ~/.config/dask/jobqueue.yaml

What worked:

  • The config files magically appeared in my .config directory
  • I was able to edit the files and start a cluster/client

What didn't work:

  • Edits to the config files did not modify the default values in the PBSCluster

@mrocklin

Copy link
Copy Markdown
Member Author

Thanks for trying things out @jhamman . If you're around and have any interest in trying things out on a screenshare I'd be interested. If not then can I ask you for the value of dask.config.config?

@jhamman

jhamman commented May 14, 2018

Copy link
Copy Markdown
Member

Sure, happy to walk you through this. Let's connect in the next couple days. I'm traveling today.

@mrocklin

Copy link
Copy Markdown
Member Author

I'm around most of today and tomorrow. Ping me on gchat when convenient?

@mrocklin

Copy link
Copy Markdown
Member Author

@jhamman I've reproduced your error, identified the bug, and fixed it in dask/dask#3500

Thank you for testing and reporting.

@jhamman

jhamman commented May 15, 2018

Copy link
Copy Markdown
Member

@mrocklin, with dask/dask#3500, this seems to be working quite well. Thanks for working out the remaining issues without me :)

@mrocklin

mrocklin commented May 15, 2018 via email

Copy link
Copy Markdown
Member Author

@jhamman

jhamman commented May 15, 2018

Copy link
Copy Markdown
Member

@kmpaul and @davide - What's the best way to get CISL to put some generic dask configs in /etc/dask on Cheyenne? I'd suggest something quite generic:

jobqueue:
  name: dask-worker
  threads: 4
  processes: 1
  memory: 3GB
  interface: ib0
  death-timeout: 60
  local-directory: null
  # extra: ""
  # env-extra: []

  queue: share
  walltime: '01:00:00'

  pbs:
    resource-spec: select=1:ncpus=4
    # job-extra: []

This would just require users to just specify a project number to get started...

from dask_jobqueue import PBSCluster
from distributed import Client

cluster = PBSCluster(project='ABC')
client = Client(cluster)
cluster.scale_up(4)

@jhamman

jhamman commented May 15, 2018

Copy link
Copy Markdown
Member

@mrocklin - once dask/dask#3500 is merged, I'd like to merge this as well. You just have a few merge conflicts to sort out before then.

@guillaumeeb guillaumeeb mentioned this pull request May 15, 2018
@guillaumeeb

Copy link
Copy Markdown
Member

Trying this, but I must be doing something wrong. Here's what I've done in my conda env:

pip install git+https://github.com/dask/dask.git
pip install git+https://github.com/dask/distributed.git
pip install git+https://github.com/mrocklin/dask-jobqueue.git@config

What I get:

In [2]: !ls /home/eh/eynardbg/.config
abrt  dask  dconf  enchant  fbpanel  gconf  geany  gedit  gtk-2.0  matplotlib  menus  nautilus  pip  Trolltech.conf

In [3]: !ls /home/eh/eynardbg/.config/dask

In [4]: from dask_jobqueue import PBSCluster
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-d1897891001f> in <module>()
----> 1 from dask_jobqueue import PBSCluster

/work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/lib/python3.5/site-packages/dask_jobqueue/__init__.py in <module>()
      1 # flake8: noqa
----> 2 from . import config
      3 from .core import JobQueueCluster
      4 from .pbs import PBSCluster
      5 from .slurm import SLURMCluster

/work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/lib/python3.5/site-packages/dask_jobqueue/config.py in <module>()
      8
      9 fn = os.path.join(os.path.dirname(__file__), 'jobqueue.yaml')
---> 10 dask.config.ensure_file(source=fn)
     11
     12 with open(fn) as f:

/work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/lib/python3.5/site-packages/dask/config.py in ensure_file(source, destination, comment)
    186         # destination while another process reads an empty config file.
    187         tmp = '%s.tmp.%d' % (destination, os.getpid())
--> 188         with open(source) as f:
    189             lines = list(f)
    190

FileNotFoundError: [Errno 2] No such file or directory: '/work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/lib/python3.5/site-packages/dask_jobqueue/jobqueue.yaml'
$ ls /work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/lib/python3.5/site-packages/dask_jobqueue
config.py  core.py  __init__.py  pbs.py  __pycache__  sge.py  slurm.py  _version.py

@mrocklin

mrocklin commented May 16, 2018

Copy link
Copy Markdown
Member Author

Ah, thank you for trying things and reporting back @guillaumeeb . I may have addressed this in 533c871 . Can you try again?

@guillaumeeb

Copy link
Copy Markdown
Member

Thanks for the fix!

So I am able to start a cluster and see the config files:

In [2]: !ls /home/eh/eynardbg/.config/dask
distributed.yaml  jobqueue.yaml

In [6]: cluster = PBSCluster()

In [8]: client = Client(cluster)

In [10]: client
Out[10]: <Client: scheduler='tcp://10.120.40.61:44310' processes=0 cores=0>

In [11]: cluster.start_workers(4)
Out[11]: [2, 3, 4, 5]

In [12]: client
Out[12]: <Client: scheduler='tcp://10.120.40.61:44310' processes=16 cores=32>

Then I tried to change the configuration, and set:

$ cat /home/eh/eynardbg/.config/dask/jobqueue.yaml
jobqueue:
#   name: dask-worker
#   threads: 2
  processes: 12
  memory: 10GB
#   interface: null
#   death-timeout: 60
#   local-directory: null
#   extra: ""
#   env-extra: []

#   queue: null
  project: DaskjobqueueTest
  walltime: '01:00:00'

  pbs:
    resource-spec: select=1:ncpus=24:mem=120GB
#     job-extra: []

I did not refresh the config at first, so obviously it did not work, but then I found the correct method into dask.config thanks to the dask documentation in your PR. Maybe there should be a refresh method on every config object?

Anyway I called it, and tried to launch another cluster:

In [35]: dask.config.config
Out[35]:
(...)
 'jobqueue': {'death-timeout': 60,
  'env-extra': [],
  'extra': '',
  'interface': None,
  'local-directory': None,
  'memory': '8GB',
  'name': 'dask-worker',
  'pbs': {'job-extra': [], 'resource-spec': None},
  'processes': 4,
  'project': None,
  'queue': None,
  'sge': {'resource-spec': None},
  'slurm': {'job-cpu': None, 'job-extra': {}, 'job-mem': None},
  'threads': 2,
  'walltime': '00:30:00'},
(...)

In [38]: dask.config.refresh()

In [39]: dask.config.config.get('jobqueue')
Out[39]:
{'death-timeout': 60,
 'env-extra': [],
 'extra': '',
 'interface': None,
 'local-directory': None,
 'memory': '10GB',
 'name': 'dask-worker',
 'pbs': {'job-extra': [], 'resource-spec': 'select=1:ncpus=24:mem=120GB'},
 'processes': 12,
 'project': 'DaskjobqueueTest',
 'queue': None,
 'sge': {'resource-spec': None},
 'slurm': {'job-cpu': None, 'job-extra': {}, 'job-mem': None},
 'threads': 2,
 'walltime': '01:00:00'}

In [41]: cluster = PBSCluster()

In [42]: client = Client(cluster)

In [43]: client
Out[43]: <Client: scheduler='tcp://10.120.40.61:43734' processes=0 cores=0>

In [44]: cluster.start_workers(2)
Out[44]: [2, 3]

In [45]: client
Out[45]: <Client: scheduler='tcp://10.120.40.61:43734' processes=8 cores=16>

In [46]: dask.config.get('jobqueue.processes')
Out[46]: 12

In [47]:  print(cluster.job_script())
    ...:
#!/bin/bash

#PBS -N dask-worker
#PBS -l select=1:ncpus=8:mem=30GB
#PBS -l walltime=00:30:00



/work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/bin/dask-worker tcp://10.120.40.61:43734 --nthreads 2 --nprocs 4 --memory-limit 8GB --name dask-worker-4 --death-timeout 60

The new config don't seem to be taken into account by the newly created PBSCluster.

Restarting the ipython process is OK though, so this may be expected:

In [2]: from distributed import Client

In [3]: from dask_jobqueue import PBSCluster

In [4]: cluster = PBSCluster()

In [5]: client = Client(cluster)

In [6]: client
Out[6]: <Client: scheduler='tcp://10.120.40.61:57283' processes=0 cores=0>

In [7]:  print(cluster.job_script())
#!/bin/bash

#PBS -N dask-worker
#PBS -A DaskjobqueueTest
#PBS -l select=1:ncpus=24:mem=120GB
#PBS -l walltime=01:00:00

/work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/bin/dask-worker tcp://10.120.40.61:57283 --nthreads 2 --nprocs 12 --memory-limit 10GB --name dask-worker-2 --death-timeout 60

One last thing, it is a bit weird to find the config file all commented by default, but I understand it is required for other mecanisms to work.

@mrocklin

Copy link
Copy Markdown
Member Author

Thanks for going through things @guillaumeeb !

Right, so currently we check the config at import time in code like the following:

    def __init__(self,
                 name=dask.config.get('jobqueue.name'),
                 threads=dask.config.get('jobqueue.threads'),
                 processes=dask.config.get('jobqueue.processes'),
                 memory=dask.config.get('jobqueue.memory'),
                 interface=dask.config.get('jobqueue.interface'),
                 death_timeout=dask.config.get('jobqueue.death-timeout'),
                 local_directory=dask.config.get('jobqueue.local-directory'),
                 extra=dask.config.get('jobqueue.extra'),
                 env_extra=dask.config.get('jobqueue.env-extra'),
                 **kwargs
                 ):

So if you update the config then our defaults won't update.

Instead, we could do something like the following:

def __init__(self,
             name=None,
             threads=None,
             processes=None,
             ...
             ):

    if name is None:
        name = dask.config.get('jobqueue.name')
    if threads is None:
        threads = dask.config.get('jobqueue.threads')
    ...

You would still have to explicitly call dask.config.refresh() but things would work.

I think that the active question here is how much we care about refreshing the configuration within a single session, vs the code cleanliness of keeping config items in the function definition. I think that this choice depends on how frequently people change their configuration.

@guillaumeeb

Copy link
Copy Markdown
Member

I think that the active question here is how much we care about refreshing the configuration within a single session, vs the code cleanliness of keeping config items in the function definition. I think that this choice depends on how frequently people change their configuration.

I agree, and I tend to prefer the code cleanliness! Configuration should not change a lot, and in this case one should use directly keyword args.
My only remaining concern is the usefulness of dask.config.refresh() in this case?

@mrocklin

Copy link
Copy Markdown
Member Author

My only remaining concern is the usefulness of dask.config.refresh() in this case?

Yeah, not very useful in this particular case. Different subprojects may make different choices here. It's not yet clear what common practice will be.

@guillaumeeb

Copy link
Copy Markdown
Member

Maybe you should add something in the dask documentation for the configuration refresh part. Telling this is not always effective, and the safer way is to restart the python process?

@jhamman

jhamman commented May 16, 2018

Copy link
Copy Markdown
Member

Just to say, I much prefer the current behavior where the defaults are determined at import. I like this because it then gives function signatures (tab expansion in jupyter notebooks for instance) that reflect the config defaults. These are a useful template for new (and old) users.

@mrocklin

Copy link
Copy Markdown
Member Author

OK, it sounds like we resolve this with @guillaumeeb 's suggestion to add to the documentation and docstring.

@mrocklin

Copy link
Copy Markdown
Member Author

Are there other things to address here? Should we start depending on git-master in tests and merge this? Should we wait for more people to test it?

@guillaumeeb

Copy link
Copy Markdown
Member

Hm, I don't quite believe you will have other testers 🙂 ... Maybe @lesteve if he's around tomorrow?

I'm fine with merging this.

@jhamman

jhamman commented May 16, 2018

Copy link
Copy Markdown
Member

I'm also fine with merging now if @lesteve doesn't have any specific comments.

mrocklin added 9 commits May 16, 2018 16:58
This builds on dask/dask#3432 to centralize
configuration within Dask.

This adds a new jobqueue.yaml config file that allows users to specify
configuration in a file and have that be used across all of their use of the
dask-jobqueue project.  This is also something that might allow IT to help
standardize configuration options for a given cluster.
@jhamman

jhamman commented May 16, 2018

Copy link
Copy Markdown
Member

Quick question that just occurred to me, once merged, is this going to pin us to dask >= 0.18? Or does dask.config.get() already exist to some extent?

@mrocklin

mrocklin commented May 16, 2018 via email

Copy link
Copy Markdown
Member Author

@mrocklin

Copy link
Copy Markdown
Member Author

Merging this in four hours if there are no further comments.

@jhamman jhamman merged commit 24d1343 into dask:master May 17, 2018
@lesteve

lesteve commented May 18, 2018

Copy link
Copy Markdown
Member

Just one comment:
I find the default argument at import-time thing a bit confusing even if people may not update their configuration very often. For example in a long-living session like a notebook, I would expect the following behaviour:

from 

from dask_jobqueue import FooCluster

cluster = FooCluster(...)

# edit the config file
cluster = FooCluster(...)  # new settings should be taken into account

I agree that the signature with foo=None, bla=None,... is not great, maybe there is a way to improve the situation. Not sure what would be a good way to do that, maybe documenting the default in the docstring is a start.

@mrocklin mrocklin deleted the config branch May 18, 2018 12:58
@mrocklin

Copy link
Copy Markdown
Member Author

I agree that this would be nice behavior to have. I'm not sure how best to accomplish it. Suggestions here would be welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants